extended abstract
A Traditional Approach to Symbolic Piano Continuation
Zhou-Zheng, Christian, Backsund, John, Chan, Dun Li, Coventry, Alex, Eslami, Avid, Goel, Jyotin, Han, Xingwen, Soomro, Danysh, Wei, Galen
Recent developments in sequence modeling have allowed continuation to be viewed as an autoregressive task, to be modeled with a suitable tokenization scheme and a powerful sequence model like the ubiquitous Transformer [1]. A nonexhaustive list of prior work in this vein includes the Music Transformer [2], Museformer [3], FIGARO [4], and MuseCoco [5]. Most research in symbolic music modeling has so far focused on generalizing these techniques to, and improving performance on, long-sequence, multitrack, multi-instrument, and/or text- or attribute-controllable generative tasks. Typically, specialized techniques must be developed for these foundation models to handle these harder tasks, such as fine- and coarse-grained attention for long sequences [3], and text feature extraction techniques [4] and attribute augmentation [5] for controllability.
Evaluating Fake Music Detection Performance Under Audio Augmentations
Sroka, Tomasz, Wężowicz, Tomasz, Sidorczuk, Dominik, Modrzejewski, Mateusz
With the rapid advancement of generative audio models, distinguishing between human-composed and generated music is becoming increasingly challenging. As a response, models for detecting fake music have been proposed. In this work, we explore the robustness of such systems under audio augmentations. To evaluate model generalization, we construct a dataset consisting of both real music and synthetic music generated using several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy. We test the performance of a recent state-of-the-art musical deepfake detection model in the presence of audio augmentations.
ChatGPT and U(X): A Rapid Review on Measuring the User Experience
ChatGPT, powered by a large language model (LLM), has revolutionized everyday human-computer interaction (HCI) since its 2022 release. While now used by millions around the world, a coherent pathway for evaluating the user experience (UX) ChatGPT offers remains missing. In this rapid review (N = 58), I explored how ChatGPT UX has been approached quantitatively so far. I focused on the independent variables (IVs) manipulated, the dependent variables (DVs) measured, and the methods used for measurement. Findings reveal trends, gaps, and emerging consensus in UX assessments. This work offers a first step towards synthesizing existing approaches to measuring ChatGPT UX, urgent trajectories to advance standardization and breadth, and two preliminary frameworks aimed at guiding future research and tool development. I seek to elevate the field of ChatGPT UX by empowering researchers and practitioners in optimizing user interactions with ChatGPT and similar LLM-based systems.
MusicGen-Chord: Advancing Music Generation through Chord Progressions and Interactive Web-UI
Jung, Jongmin, Jansson, Andreas, Jeong, Dasaem
MusicGen is a music generation language model (LM) that can be conditioned on textual descriptions and melodic features. We introduce MusicGen-Chord, which extends this capability by incorporating chord progression features. This model modifies one-hot encoded melody chroma vectors into multi-hot encoded chord chroma vectors, enabling the generation of music that reflects both chord progressions and textual descriptions. Furthermore, we developed MusicGen-Remixer, an application utilizing MusicGen-Chord to generate remixes of input music conditioned on textual descriptions. Both models are integrated into Replicate's web-UI using cog, facilitating broad accessibility and user-friendly controllable interaction for creating and experiencing AI-generated music.
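The difference between a one-hot melody chroma vector and a multi-hot chord chroma vector can be illustrated with a minimal sketch. This is not the MusicGen-Chord implementation: the function names, the pitch-class ordering, and the small chord-quality table below are assumptions chosen only for illustration.

```python
# Illustrative sketch only; names and the chord table are hypothetical,
# not taken from MusicGen or MusicGen-Chord.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed chord-quality table: semitone offsets above the root.
CHORD_INTERVALS = {
    "maj": (0, 4, 7),
    "min": (0, 3, 7),
    "7": (0, 4, 7, 10),
}


def melody_chroma(note):
    """One-hot 12-dimensional chroma vector: exactly one active pitch class."""
    vec = [0] * 12
    vec[PITCH_CLASSES.index(note)] = 1
    return vec


def chord_chroma(root, quality="maj"):
    """Multi-hot 12-dimensional chroma vector: one active bin per chord tone."""
    vec = [0] * 12
    root_idx = PITCH_CLASSES.index(root)
    for interval in CHORD_INTERVALS[quality]:
        # Wrap around the octave so every chord tone maps to a pitch class.
        vec[(root_idx + interval) % 12] = 1
    return vec


if __name__ == "__main__":
    print(melody_chroma("E"))        # a single active bin (E)
    print(chord_chroma("C", "maj"))  # three active bins (C, E, G)
```

In this sketch a chord progression would become a sequence of such vectors, one per time frame, with the same shape as the melody conditioning signal.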
AdaCropFollow: Self-Supervised Online Adaptation for Visual Under-Canopy Navigation
Sivakumar, Arun N., Magistri, Federico, Gasparino, Mateus V., Behley, Jens, Stachniss, Cyrill, Chowdhary, Girish
Under-canopy agricultural robots can enable various applications like precise monitoring, spraying, weeding, and plant manipulation tasks throughout the growing season. Autonomous navigation under the canopy is challenging due to the degradation in accuracy of RTK-GPS and the large variability in the visual appearance of the scene over time. In prior work, we developed a supervised learning-based perception system with semantic keypoint representation and deployed this in various field conditions. A large number of failures of this system can be attributed to the inability of the perception model to adapt to the domain shift encountered during deployment. In this paper, we propose a self-supervised online adaptation method for adapting the semantic keypoint representation using a visual foundational model, geometric prior, and pseudo labeling. Our preliminary experiments show that with minimal data and fine-tuning of parameters, the keypoint prediction model trained with labels on the source domain can be adapted in a self-supervised manner to various challenging target domains onboard the robot computer using our method. This can enable fully autonomous row-following capability in under-canopy robots across fields and crops without requiring human intervention.
Learning to Turn: Diffusion Imitation for Robust Row Turning in Under-Canopy Robots
Sivakumar, Arun N., Thangeda, Pranay, Fang, Yixiao, Gasparino, Mateus V., Cuaran, Jose, Ornik, Melkior, Chowdhary, Girish
Under-canopy agricultural robots require robust navigation capabilities to enable full autonomy but struggle with tight row turning between crop rows due to degraded GPS reception, visual aliasing, occlusion, and complex vehicle dynamics. We propose an imitation learning approach using diffusion policies to learn row turning behaviors from demonstrations provided by human operators or privileged controllers. Simulation experiments in a corn field environment show potential in learning this task with only visual observations and velocity states. However, challenges remain in maintaining control within rows and handling varied initial conditions, highlighting areas for future improvement.
Structured Active Inference (Extended Abstract)
We introduce structured active inference, a large generalization and formalization of active inference using the tools of categorical systems theory. We cast generative models formally as systems "on an interface", with the latter being a compositional abstraction of the usual notion of Markov blanket; agents are then 'controllers' for their generative models, formally dual to them. This opens the active inference landscape to new horizons, such as: agents with structured interfaces (e.g. with 'mode-dependence', or that interact with computer APIs); agents that can manage other agents; and 'meta-agents', that use active inference to change their (internal or external) structure. With structured interfaces, we also gain structured ('typed') policies, which are amenable to formal verification, an important step towards safe artificial agents. Moreover, we can make use of categorical logic to express agents' goals as formal predicates, whose satisfaction may be dependent on the interaction context. This points towards powerful compositional tools to constrain and control self-organizing ensembles of agents.
A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation
Lyu, Yao, Zhang, He, Niu, Shuo, Cai, Jie
Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being applied, and the methodologies content creators employ with Gen-AI tools during the creation process. This study initially explores this emerging area through a qualitative analysis of 68 YouTube videos demonstrating Gen-AI usage. Our research focuses on identifying the content domains, the variety of tools used, the activities performed, and the nature of the final products generated by Gen-AI in the context of user-generated content.
Predicting Winning Regions in Parity Games via Graph Neural Networks (Extended Abstract)
Hecking, Tobias, Muthukrishnan, Swathy, Weinert, Alexander
Solving parity games is a major building block for numerous applications in reactive program verification and synthesis. While they can be solved efficiently in practice, no known approach has a polynomial worst-case runtime complexity. We present an incomplete polynomial-time approach to determining the winning regions of parity games via graph neural networks. Our evaluation on 900 randomly generated parity games shows that this approach is effective and efficient in practice. It correctly determines the winning regions of approximately 60% of the games in our data set and only incurs minor errors in the remaining ones. We believe that this approach can be extended to efficiently solve parity games as well.
Evaluating the "Learning on Graphs" Conference Experience
Rieck, Bastian, Coupette, Corinna
With machine learning conferences growing ever larger, and reviewing processes becoming increasingly elaborate, more data-driven insights into their workings are required. In this report, we present the results of a survey accompanying the first "Learning on Graphs" (LoG) Conference. The survey was directed to evaluate the submission and review process from different perspectives, including authors, reviewers, and area chairs alike. The first "Learning on Graphs" (LoG) Conference (9-12 December, 2022) was remarkable in more ways than one: starting from scratch, the conference aims to be the place for graph learning research, making use of an advisory committee that consists of international experts in the field. Moreover, at its core, LoG wants to be known for its exceptional review quality.